ABSTRACT

The errors on statistics measured in finite galaxy catalogs are exhaustively
investigated. The theory of errors on factorial moments by Szapudi & Colombi (1996) is
applied to cumulants via a series expansion method. All results are subsequently ex- 
tended to the weakly non-linear regime. Together with previous investigations this 
yields an analytic theory of the errors for moments and connected moments of counts 
in cells from highly nonlinear to weakly nonlinear scales. For nonlinear functions of un- 
biased estimators, such as the cumulants, the phenomenon of cosmic bias is identified 
and computed. Since it is subdued by the cosmic errors in the range of applicability of 
the theory correction for it is inconsequential. In addition, the method of Colombi, Sza- 
pudi & Szalay (1998) concerning sampling effects is generalized, adapting the theory 
for inhomogeneous galaxy catalogs. While previous work focused on the variance only, 
the present article calculates the cross-correlations between moments and connected 
moments as well for a statistically complete description. The final analytic formulae 
representing the full theory are explicit but somewhat complicated. Therefore as a 
companion to this paper we supply a FORTRAN program capable of calculating the 
described quantities numerically. An important special case is the evaluation of the 
errors on the two-point correlation function, for which this should be more accurate 
than any method put forward previously. This tool will be immensely useful in the 
future both for assessing the precision of measurements from existing catalogs, as well 
as aiding the design of new galaxy surveys. To illustrate the applicability of the results 
and to explore the numerical aspects of the theory qualitatively and quantitatively, 
the errors and cross-correlations are predicted under a wide range of assumptions for 
the future Sloan Digital Sky Survey. The principal result concerning the cumulants
$\bar{\xi}$, $Q_3$, and $Q_4$ is that the relative error is expected to be smaller than 3, 5, and 15
percent, respectively, in the scale range of $1\,h^{-1}$ Mpc to $10\,h^{-1}$ Mpc; the cosmic bias will
be negligible.

Key words: large scale structure of the universe - galaxies: clustering -
methods: numerical - methods: statistical 



1 INTRODUCTION 

According to theories of cosmological structure formation 
small initial fluctuations grew by gravitational amplification. 
In the last decade, higher order statistics emerged as an im- 
portant tool to test both the Gaussianity of initial conditions 
and the gravitational amplification process. These tests are 
a priori possible in the perturbation theory (PT) regime 
where many predictions have been obtained by now (see 
Juszkiewicz & Bouchet 1995; Bernardeau 1996b for recent 
short reviews), or in the nonlinear regime. In both cases, 
they can potentially alleviate the ambiguity of the galaxy 
two-point correlation function when light does not trace 
mass (biasing), thereby shedding light on cosmology as well
as the physics of galaxy formation (e.g. Fry & Gaztañaga
1993; Gaztañaga & Frieman 1994; Szapudi 1998b).

A tight control of the errors is crucial for the interpre- 
tation of higher order measurements from galaxy catalogs. 
A sufficiently general and reliable knowledge of the expected 
errors is all the more timely as new galaxy surveys will come 
online in the near future. Building on the groundwork
described in two previous papers, Szapudi & Colombi (1996,
hereafter SC), and Colombi, Szapudi & Szalay (1998, here- 
after CSS), the aim of this article is to formulate a coherent 
analytic theory for the errors of moments and connected 
moments of counts in cells in all scale regimes for possibly 
inhomogeneous galaxy surveys. 

There have been several explorations in the past concentrating
mainly on the errors of the two-point correlation
function in real and Fourier space (e.g., Peebles 1980;
Kaiser 1986; Landy & Szalay 1993; Feldman, Kaiser & Peacock
1994; Hamilton 1993; Hamilton 1997a, 1997b; Scoccimarro,
Zaldarriaga & Hui 1999) or the $N$-point correlation
functions (Szapudi & Szalay 1998). The analytic calculation
of the error on the void probability function is described
in Colombi, Bouchet & Schaeffer (1995). As moments of 
counts in cells have been the most successful descriptors of 
higher order statistics so far, SC set out to formulate the 
general theory of variances related to counts in cells in a 
finite galaxy catalog. Explicit, analytic formulae were de- 
termined for estimating cosmic errors of the factorial mo- 
ments. The main underlying assumptions were the locally 
Poissonian approximation, and the hierarchical ansatz for 
the higher order correlations. The first consists of neglect- 
ing correlations among parts of overlapping cells, while the 
latter is known to be an excellent approximation in existing
galaxy catalogs (e.g., Groth & Peebles 1977; Fry & Peebles
1978; Sharp, Bonometto & Lucchin 1984; Szapudi, Szalay &
Boschan 1992; Meiksin, Szapudi & Szalay 1992; Bouchet et
al. 1993; Szapudi et al. 1995; Szapudi & Szalay 1997) and
in $N$-body simulations in the highly non-linear regime (e.g.,
Efstathiou et al. 1988; Bouchet et al. 1991; Bouchet &
Hernquist 1992; Fry, Melott & Shandarin 1993; Bromley 1994;
Lucchin et al. 1994; Colombi, Bouchet & Schaeffer 1994;
Colombi, Bouchet & Hernquist 1996; Munshi et al. 1999a;
Szapudi et al. 1999d). CSS applied the previously developed
theory and investigated the effects of variable sampling and 
thereby extended the results for inhomogeneous galaxy sur- 
veys. An exhaustive description of the previous calculations 
would be superfluous here since all details can be found in 
SC. Some of the main concepts and the general framework,
however, are summarized next.

Careful examination of the generating functions and 
their expansions yields a unique classification of the errors 
according to their origin and an approximate separation between
them. Part of the uncertainty on counts in cells is due
to the finite number of sampling cells, $C$. It is termed
measurement error and it is proportional to $1/\sqrt{C}$; therefore it
can be rendered arbitrarily small. The algorithm of Szapudi
(1998a) achieves the limit of $C \to \infty$ in practice, i.e. the
measurement errors are absent.

The rest of the variance, termed cosmic error, is inher- 
ent to the galaxy catalog and cannot be substantially im- 
proved upon except for extending the survey itself. It splits 
further into a trichotomy of finite volume effects, arising 
from the fluctuations on scales larger than the survey, edge 
effects, from the uneven weights given to galaxies in rela- 
tion to survey geometry, and discreteness effects, due to the 
finite number of galaxies tracing the underlying continuous 
random field. To leading order in $v/V$, these three effects are
approximately disjoint and the corresponding relative errors
are proportional to $[\bar{\xi}(L)]^{1/2}$, $(\bar{\xi} v/V)^{1/2}$, and $[v/(V\bar{N}^k)]^{1/2}$,
respectively; $\bar{\xi}(L)$ is the integral of the correlation function
(with some restrictions) over the whole survey area, $\bar{\xi}$ is the
average correlation function in a cell, $\bar{N}$ is the average count
in a cell, $k$ is the order of the statistic, and $v$ and $V$ are the
volumes of the cell and the survey, respectively. Only the
discreteness error depends on the number of particles, and 
it disappears in the continuum limit. The separation of these 
effects is only approximate, and depends on the leading order
nature of the calculation. Next to leading order contributions
are presented elsewhere (Colombi et al. 1999a).

There are further refinements and qualifications to the
above summarized theory. Edge effects, usually dominant on
large scales, can be corrected for to some extent (Landy &
Szalay 1993; Szapudi & Szalay 1998). Such a correction is
always equivalent to a virtual extension of the survey, thus
it is controversial as often pointed out by "fractalists". A
fraction of discreteness effects depends on the geometry of
the survey and thus can be termed edge-discreteness effects
(Szapudi & Szalay 1998). Finally, finite volume effects overlap
slightly with edge effects, even though the appropriate
splitting of the corresponding integral yields an approximate
separation.

The present work generalizes the previous calculations 
for many useful statistics, such as the connected moments 
or cumulants of the probability distribution of counts in 
cells, and extends the validity of the theory into the weakly 
non-linear regime by dropping the hierarchical assumption. 
Moreover, cross correlation matrices for moments and con- 
nected moments are computed as well for statistical com- 
pleteness. To facilitate the practical application of this 
somewhat complicated but fully explicit and analytic the- 
ory, we supply FORTRAN programs to evaluate all the 
(co)variances of moments and cumulants. This should di- 
minish the efforts needed to assess the accuracy of counts 
in cells measurements in present and future galaxy cata- 
logs, such as the Sloan Digital Sky Survey (SDSS) and the 
two degree Field Survey, as well as in simulations. In addi- 
tion, design of future galaxy catalogs should be optimized 
in light of the expected errors for different alternatives. To 
demonstrate the practicality of our approach, the theory is 
illustrated throughout this paper by calculating the cosmic 
errors, cross-correlations, and biases for all relevant statistics
related to counts in cells in the future SDSS. It is worth
emphasizing that our technique can be used to obtain the
errors on the two-point correlation function with more ac- 
curate results than any previous method. 

The next Section describes the general theory of non- 
linear error propagation including the resulting bias and the 
calculation of (co)variances, with extension of the analysis 
to the weakly non-linear regime. Sect. 3 presents practical
results for the SDSS survey: the expected errors, biases and 
cross-correlation of factorial moments and cumulants up to 
fourth order are given for a wide variety of clustering mod- 
els. Finally, Sect. 4 summarizes and discusses the results. 
In addition, Appendix A illustrates the theory with explicit 
formulae too cumbersome to be included in the main text. 
Appendix B compares in detail our predictions for the cosmic
bias on cumulants with the recent results of Hui &
Gaztañaga (1998, hereafter HG).



2 THEORY 

In this section we present the theory of cosmic errors on the
quantities of interest, the cumulants (or connected moments) $\bar{\xi}$
and $Q_N$ of the probability distribution function of the cosmic
density. The central issue addressed here is the propagation
of errors from the factorial moments $F_k$ to the cumulants,
the latter being nonlinear combinations of the former. To
this end, Sect. 2.1 presents the theory of error propagation
in a general setting for functions of correlated random
variables. Sect. 2.2 applies this formalism to factorial
moments and cumulants, taking advantage of the theory of
cosmic errors on factorial moments by SC. Finally, Sect. 2.3
discusses the specific models of clustering employed for
numerical demonstration of the theory, including generalization
of the original framework for PT.



2.1 General Error Propagation and Bias 

Let us assume that $f(x)$ is constructed from unbiased measurements
of a set of random variables $\{x_k\}$ with known
errors and cross-correlations*. For measurements of a statistical
quantity $x_k$, a different notation (such as $\tilde{x}_k$) could
be introduced for added precision. However, such notation is
dispensed with since it would only clutter the formulae without
adding anything of importance. If the measurements $\{x_k\}$
are sufficiently close to their ensemble average, $\{\langle x_k\rangle\}$, it is
meaningful to expand $f$ around the mean value



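$$ f(x) = f(\langle x\rangle) + \frac{\partial f}{\partial x_k}\,\delta x_k + \frac{1}{2}\frac{\partial^2 f}{\partial x_k \partial x_l}\,\delta x_k \delta x_l + O(\delta x^3), \qquad (1) $$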



where $\delta x_k = x_k - \langle x_k\rangle$, and the Einstein convention was
used. It is fruitful to evaluate the variance and bias of
$f$, and the cross-correlation of two such functions $f, g$, up
to second order precision. The resulting theory will be
reasonably accurate as long as the variances and correlations
of the underlying statistics are sufficiently small,
i.e. $\langle\delta x_k \delta x_l\rangle / \langle x_k\rangle\langle x_l\rangle \ll 1$. Taking the ensemble average
of the above equation yields the average of $f$ in a finite survey



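$$ \langle f\rangle = f(\langle x\rangle) + \frac{1}{2}\frac{\partial^2 f}{\partial x_k \partial x_l}\,\langle\delta x_k \delta x_l\rangle + O(\delta x^3). \qquad (2) $$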

According to this equation $f(x)$ is a biased estimator of
$f(\langle x\rangle)$ (see also HG). More precisely, if $x$ is an unbiased
estimator, the (relative) bias on $f(x)$ can be defined as

$$ b_f = \frac{\langle f(x)\rangle - f(\langle x\rangle)}{f(\langle x\rangle)}. \qquad (3) $$



To second order, an unbiased estimator can be constructed
from the formula. The bias is the result of the non-linear
construction of $f$ from unbiased measurements $x$. As the
survey becomes larger the errors decrease, $\langle\delta x_k \delta x_l\rangle \to 0$,
and, in agreement with intuition, $f$ becomes less and less
biased.

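Similarly the covariance of two functions $f$ and $g$ can
be evaluated,

$$ {\rm Cov}(f,g) = \langle\delta f\,\delta g\rangle = \frac{\partial f}{\partial x_k}\frac{\partial g}{\partial x_l}\,\langle\delta x_k \delta x_l\rangle + O(\delta x^3), \qquad (4) $$

where $\delta X = X - \langle X\rangle$. The variance of a function $f$ is simply
$(\Delta f)^2 = {\rm Cov}(f,f)$, and the relative error

$$ \sigma_f = \frac{\Delta f}{f}. \qquad (5) $$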

This is the general form of the widely quoted "error propa- 
gation" formula with correlated errors. 
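The second-order expansions above reduce to a few lines of linear algebra. The following Python sketch is only an illustration: the function name `propagate_errors` and its arguments are hypothetical and are not part of the FORTRAN programs accompanying this paper. It assumes that the gradient and Hessian of $f$ at the mean, and the covariance matrix of the $x_k$, are already known.

```python
import numpy as np

def propagate_errors(grad_f, hess_f, grad_g, cov_x):
    """Second-order bias of f and leading-order covariance of f and g.

    grad_f, grad_g : first derivatives of f and g at <x>
    hess_f         : matrix of second derivatives of f at <x>
    cov_x          : covariance matrix <dx_k dx_l> of the measurements
    """
    grad_f = np.asarray(grad_f, dtype=float)
    grad_g = np.asarray(grad_g, dtype=float)
    hess_f = np.asarray(hess_f, dtype=float)
    cov_x = np.asarray(cov_x, dtype=float)
    # <f> - f(<x>) ~ (1/2) d2f/dx_k dx_l <dx_k dx_l>  (sum over k and l)
    absolute_bias = 0.5 * np.sum(hess_f * cov_x)
    # Cov(f, g) ~ df/dx_k dg/dx_l <dx_k dx_l>
    covariance = grad_f @ cov_x @ grad_g
    return absolute_bias, covariance

# Illustrative numbers only: f = x_0 * x_1 evaluated at <x> = (2, 3).
mean_x = np.array([2.0, 3.0])
cov_x = np.array([[0.01, 0.002], [0.002, 0.04]])
grad = np.array([mean_x[1], mean_x[0]])
hess = np.array([[0.0, 1.0], [1.0, 0.0]])
bias, variance = propagate_errors(grad, hess, grad, cov_x)
relative_bias = bias / (mean_x[0] * mean_x[1])
relative_error = np.sqrt(variance) / (mean_x[0] * mean_x[1])
```

For this product example the second-order bias is exact and equals the covariance of $x_0$ and $x_1$; for a general $f$ the result is only the leading term of the expansion.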



* As long as errors on $x_k$ are small they can follow any joint distribution.
In particular they do not have to be Gaussian distributed.



For a set of (possibly biased) statistics $f = \{f_k\}_{k=1,K}$,
the covariance matrix is defined as $C_{kl} = {\rm Cov}(f_k, f_l)$, which
is in turn crucial for maximum likelihood analyses. For reference,
the appropriate likelihood function in the Gaussian
limit is (the logarithm of)

$$ T(f) = \frac{1}{\sqrt{(2\pi)^K\,{\rm Det}(C)}} \exp\left[-\frac{1}{2}\,\delta f_k\, C^{-1}_{kl}\,\delta f_l\right], \qquad (6) $$

where ${\rm Det}(C)$ and $C^{-1}$ are the determinant and inverse of
the covariance matrix.
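As an illustration of how such a covariance matrix enters a likelihood analysis, the logarithm of a multivariate Gaussian likelihood can be evaluated as in the sketch below. The function name and the argument names are assumptions made for this example; the numerically stable `slogdet` and `solve` calls replace an explicit determinant and matrix inverse.

```python
import numpy as np

def gaussian_log_likelihood(f_measured, f_model, cov):
    """Log of a K-dimensional Gaussian likelihood with covariance matrix cov."""
    delta = np.asarray(f_measured, dtype=float) - np.asarray(f_model, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = delta.size
    sign, log_det = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # delta_k C^{-1}_{kl} delta_l, computed without forming the inverse
    chi2 = delta @ np.linalg.solve(cov, delta)
    return -0.5 * (chi2 + log_det + k * np.log(2.0 * np.pi))
```

With a unit covariance matrix and a perfect fit in two dimensions, the function returns $-\log(2\pi)$.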

The range of applicability of the previous equations
merits some comments. The most obvious condition is that
the relative (co)variance, equation (5), satisfies $\sigma_f \ll 1$; otherwise the Taylor
expansion diverges. From the equations above, the
bias is of order $b_f = O(\sigma^2)$. Clearly, there is a meaningful
regime

$$ b_f < \sigma_f < 1, \qquad (7) $$

where the theory is certainly valid. In practice $b_f \sim \sigma_f \ll 1$
can happen, contradicting, however, the condition that
$b_f \sim \sigma^2$. This is a sign of cancellations in the coefficients,
and in that case higher order expansions would be necessary
to obtain the leading order results.



2.2 Cosmic Errors and Cross-Correlations on 
Cumulants and Factorial Moments 

For the present applications of the above formulae, the average
count $\bar{N}$, the variance $\bar{\xi}$, and the cumulants $Q_3$ and $Q_4$
are substituted for $\{f_k\}$. As shown below, each of these can
be expressed in terms of the factorial moments $F_k$ (identified
with $x_k$). For further reference we first recall basic
definitions, then we formulate the theory of errors of SC for
factorial moments.

The variance of counts in cells is the average of the correlation
function in a cell,

$$ \bar{\xi} = \int_v \xi(r_1, r_2)\,\frac{d^3r_1}{v}\frac{d^3r_2}{v}. \qquad (8) $$

The cumulants of higher order are geometrical averages of
the $N$-point correlation functions,

$$ Q_N = \frac{1}{N^{N-2}\,\bar{\xi}^{N-1}} \int_v \xi_N(r_1, \ldots, r_N)\,\frac{d^3r_1}{v}\cdots\frac{d^3r_N}{v}, \qquad (9) $$

and by definition $Q_1 = Q_2 = 1$. Another widespread notation
exists in the literature for $Q_N$,

$$ S_N = N^{N-2} Q_N, \qquad (10) $$

where the $N^{N-2}$ factor corresponds to the number of trees
that connect $N$ points.

The connected moments are non-linear functions of the
factorial moments (see Szapudi & Szalay 1993), e.g.,

$$ \bar{N} = F_1, \qquad (11) $$

$$ \bar{\xi} = \frac{F_2}{F_1^2} - 1, \qquad (12) $$

$$ Q_3 = \frac{F_1\,(F_3 - 3F_1F_2 + 2F_1^3)}{3\,(F_2 - F_1^2)^2}, \qquad (13) $$

$$ Q_4 = \frac{F_1^2\,(F_4 - 4F_3F_1 - 3F_2^2 + 12F_2F_1^2 - 6F_1^4)}{16\,(F_2 - F_1^2)^3}, \qquad (14) $$

where

$$ F_k \equiv \langle (N)_k\rangle \equiv \langle N(N-1)\cdots(N-k+1)\rangle. \qquad (15) $$

Factorial moments are estimated in an unbiased fashion; the bias
affecting the cumulants is due to their non-linear construction.
Both the errors and biases of cumulants can be deduced from
the errors and cross-correlations of the factorial moments,
${\rm Cov}(F_k, F_l) = \langle\delta F_k \delta F_l\rangle$, through the series expansions
of Sect. 2.1 if the variances are sufficiently small.
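The non-linear map from factorial moments to cumulants can be sketched in Python as follows. This is a minimal illustration, not the accompanying FORTRAN code: the function names are hypothetical, and the inverse map is included only so that the relations can be checked by a round trip. The inverse uses $S_3 = 3Q_3$ and $S_4 = 16Q_4$ together with the standard expressions for the moments of a continuous density field.

```python
def cumulants_from_factorial_moments(f1, f2, f3, f4):
    """Return (Nbar, xi_bar, Q3, Q4) from the factorial moments F_1..F_4."""
    variance = f2 - f1**2  # equals Nbar^2 * xi_bar
    nbar = f1
    xi_bar = f2 / f1**2 - 1.0
    q3 = f1 * (f3 - 3.0 * f1 * f2 + 2.0 * f1**3) / (3.0 * variance**2)
    q4 = (f1**2 * (f4 - 4.0 * f3 * f1 - 3.0 * f2**2
                   + 12.0 * f2 * f1**2 - 6.0 * f1**4)
          / (16.0 * variance**3))
    return nbar, xi_bar, q3, q4

def factorial_moments_from_cumulants(nbar, xi_bar, q3, q4):
    """Inverse map, used here only as a consistency check."""
    s3, s4 = 3.0 * q3, 16.0 * q4
    f1 = nbar
    f2 = nbar**2 * (1.0 + xi_bar)
    f3 = nbar**3 * (1.0 + 3.0 * xi_bar + s3 * xi_bar**2)
    f4 = nbar**4 * (1.0 + 6.0 * xi_bar + 3.0 * xi_bar**2
                    + 4.0 * s3 * xi_bar**2 + s4 * xi_bar**3)
    return f1, f2, f3, f4
```

Note that the denominators vanish for a Poisson distribution ($\bar{\xi} = 0$), where $Q_3$ and $Q_4$ are undefined; the sketch does not guard against that case.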

The diagonal term, ${\rm Cov}(F_k, F_k)$, was evaluated by SC
under the hierarchical and local Poisson behavior assumptions.
For the present generalizations i) the hierarchical assumption
has to be discarded (see next subsection), ii) the
$k \neq l$ cross terms need to be evaluated as well. The
cross-correlations of the factorial moments are obtained through
a calculation completely analogous to, if more cumbersome than,
that described in SC. The basic steps are outlined next.

To evaluate the cross-correlations, the full error generating
function of the factorial moments, which contains the
measurement errors and the cosmic errors, should be expanded (SC),

$$ {\rm Cov}(F_k, F_l) = \left[\frac{\partial}{\partial x}\right]^k \left[\frac{\partial}{\partial y}\right]^l E^{C,V}(x+1, y+1)\,\bigg|_{x=y=0}, \qquad (16) $$

with

$$ E^{C,V}(x,y) = \left(1 - \frac{1}{C}\right) E^{\infty,V}(x,y) + E^{C,\infty}(x,y). \qquad (17) $$



For completeness, the measurement errors are generated by

$$ E^{C,\infty}(x,y) = \frac{P(xy) - P(x)P(y)}{C}, \qquad (18) $$

where $P(x)$ is the generating function of the distribution of
counts in cells; $F_k = (d/dx)^k P(x+1)|_{x=0}$. The measurement
errors can always be eliminated with the large or infinite
number of sampling cells employed in state of the art
measurement algorithms (Szapudi 1998a; Szapudi et al. 1999d).
Therefore the limit $C \to \infty$ is taken, i.e. the number of sampling
cells tends to infinity, and measurement errors shall
not be mentioned further.
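Since $P(x) = \sum_N P_N x^N$, differentiating $P(x+1)$ $k$ times at $x = 0$ gives $F_k = \sum_N P_N\, N(N-1)\cdots(N-k+1)$. The short Python sketch below evaluates this sum for a tabulated count distribution; the function names are hypothetical, and the Poisson distribution is used only as a test case with known factorial moments $F_k = \bar{N}^k$.

```python
import math

def falling_factorial(n, k):
    """N (N-1) ... (N-k+1), with the empty product equal to 1 for k = 0."""
    result = 1
    for j in range(k):
        result *= n - j
    return result

def factorial_moment(probabilities, k):
    """F_k = sum_N P_N (N)_k for count probabilities P_0, P_1, ..."""
    return sum(p * falling_factorial(n, k) for n, p in enumerate(probabilities))

def poisson_pmf(mean, n_max):
    """Poisson probabilities P_0..P_{n_max}, truncated at n_max."""
    return [math.exp(-mean) * mean**n / math.factorial(n) for n in range(n_max + 1)]

pmf = poisson_pmf(1.5, 60)
moments = [factorial_moment(pmf, k) for k in range(1, 5)]
```

The truncation at `n_max` is an approximation; it must be chosen large enough that the neglected tail does not affect the highest moment required.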

The surviving part of the generating function is
$E^{\infty,V}(x,y) = \langle P(x)P(y)\rangle - \langle P(x)\rangle\langle P(y)\rangle$ with

$$ \langle P(x)P(y)\rangle = \frac{1}{V^2}\int_V d^Dr_1\, d^Dr_2\, P(x,y), \qquad (19) $$

where $D$ is the dimension of the survey, $V$ is the volume
covered by cells included in the catalog and $P(x,y)$ is the
generating function of bicounts for cells separated by a
distance $|r_1 - r_2|$. Throughout the paper three-dimensional
geometry is assumed. The above equation yields both cosmic
errors and cross-correlations. The calculation is facilitated
by separating the double integral according to whether the cells
corresponding to the two coordinates overlap (o) or not (no). Details
can be found in SC where the $k = l$ terms were evaluated.

The contribution to the cosmic errors from disjoint cells 
corresponds to the finite volume errors, obtained from Tay- 
lor expanding the bivariate generating function of counts in 
cells, as shown below. 

The contribution from overlapping cells corresponds to 
the edge and discreteness effects. Its evaluation is somewhat 
tedious, involving a numerical integration after the expan- 
sion of the generating function. Nevertheless there are no 
further complications compared to the diagonal case of SC. 



The locally Poissonian assumption allows a major simplifi- 
cation of the calculation: only the monovariate generating 
function is integrated instead of the significantly more com- 
plicated trivariate function. 

2.3 Generating functions and models 

The original calculations of SC were based on a successful 
model for the highly non-linear regime, the hierarchical tree 
assumption. This assumption has never been fully demonstrated,
although some hints for it have been given recently
(e.g., Scoccimarro & Frieman 1998). Since the coherent in- 
fall on large scales introduces an angle dependence in the 
perturbation theory kernels (e.g., Goroff et al. 1986), this 
approximation breaks down in the weakly non-linear regime. 
This necessitated a generalization of the previously used 
assumptions for this article. The resulting new generating 
functions accommodate most models currently used, such 
as the Ansatz by Szapudi & Szalay (1993), denoted by SS 
and the one by Bernardeau & Schaeffer (1992), denoted by 
BeS, perturbation theory (PT), and extended perturbation 
theory (Colombi et al. 1997), hereafter EPT. 

The other simplifying assumption of SC, the local Pois- 
sonian Ansatz, is kept for the present calculations. To elim- 
inate it would require major modification in the numerical 
method, due to the trivariate generating function. Fortu- 
nately all indications point to the extreme accuracy of this 
assumption for error calculations, although for cross-correlations
it becomes increasingly questionable as the difference
of orders, $|k - l|$, increases (Colombi et al. 1999b).



Since the models and the method of calculation are de- 
scribed by SC in sufficient detail, only the new features aris- 
ing from the present general setting are pointed out next. 

As described in § 2.2, the calculation of errors requires 
the knowledge of the monovariate and the bivariate gener- 
ating functions for the counts. 

The monovariate generating function remains formally
unchanged compared to SC, since the original form (White
1979; Balian & Schaeffer 1989; Szapudi & Szalay 1993) is
completely general,

$$ P(x) = \exp\left\{\sum_{N=1}^{\infty} (x-1)^N\, \Gamma_N\, Q_N\right\}, \qquad (20) $$

with

$$ \Gamma_N = \frac{N^{N-2}}{N!}\,\bar{N}^N\,\bar{\xi}^{N-1}. \qquad (21) $$

However, various assumptions about $\bar{\xi}$ and $Q_N$ are different
for each model. These can be obtained either from measurements
or phenomenology in the case of the SS and BeS
models, or from the form of the primordial power spectrum
for PT. PT has specific rules to relate $Q_N$ to the local
derivatives of the power spectrum (Juszkiewicz, Bouchet &
Colombi 1993; Bernardeau 1994a, b), e.g.,

$$ Q_3 = \frac{34}{21} - \frac{n+3}{3}, \qquad (22) $$

$$ Q_4 = \frac{7589}{2646} - \frac{31(n+3)}{24} + \frac{7(n+3)^2}{48}, \qquad (23) $$

with $n = -3 - d\log\bar{\xi}/d\log\ell$. From here on higher order
derivatives $\gamma_j = d^j\bar{\xi}/(d\log\ell)^j$ are neglected in the calculation
of $Q_N$, $N \geq 4$ (Bernardeau 1994b). This is an accurate
approximation and simplifies the calculations (e.g., Colombi
et al. 1999b).
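The PT predictions for $Q_3$ and $Q_4$ as functions of the spectral index can be tabulated with exact rational arithmetic. The sketch below is illustrative: the function names are hypothetical, and the coefficients correspond to $S_3 = 34/7 - (n+3)$ and $S_4 = 60712/1323 - 62(n+3)/3 + 7(n+3)^2/3$ divided by 3 and 16 respectively, with the higher order derivative terms neglected as in the text.

```python
from fractions import Fraction

def q3_pt(n):
    """Q_3 = S_3 / 3 in perturbation theory for spectral index n."""
    n3 = Fraction(n) + 3
    return Fraction(34, 21) - n3 / 3

def q4_pt(n):
    """Q_4 = S_4 / 16 in perturbation theory, gamma_2 term neglected."""
    n3 = Fraction(n) + 3
    return Fraction(7589, 2646) - Fraction(31, 24) * n3 + Fraction(7, 48) * n3**2

# Tabulate a few spectral indices; the values of n are arbitrary examples.
table = {n: (q3_pt(n), q4_pt(n)) for n in (-3, -2, -1)}
```

Using `Fraction` avoids rounding in the coefficients; the results can be converted with `float()` when needed.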

The general form of the bivariate generating function
is (Schaeffer 1985; Bernardeau & Schaeffer 1992; Szapudi &
Szalay 1993)

$$ P(x,y) = P(x)P(y)\exp[R(x,y)], \qquad (24) $$

where $R(x,y)$ contains the cumulants connecting two cells,

$$ R(x,y) = \xi \sum_{M=1,N=1}^{\infty} (x-1)^M (y-1)^N\, Q_{NM}\,\Gamma_M\Gamma_N\, NM. \qquad (25) $$

The coefficients $Q_{NM}$, the cumulant correlators, are defined
similarly to the $Q_N$'s,

$$ Q_{NM} = \frac{1}{N^{N-1}M^{M-1}\,\xi\,\bar{\xi}^{N+M-2}} \int_{v_1}\frac{d^3r_1}{v}\cdots\frac{d^3r_N}{v} \int_{v_2}\frac{d^3r_{N+1}}{v}\cdots\frac{d^3r_{N+M}}{v}\; \xi_{N+M}(r_1, \ldots, r_{N+M}). \qquad (26) $$

This is an integral of the $(N+M)$-point correlation function
over two separate cells. The normalization corresponds to
the number of possible trees in each cell multiplied with
possible non-loop connections between the cells multiplied
with the appropriate power of the average correlation function.
Thus the $Q_{NM}$'s become unity when the underlying
tree graphs of the higher order correlation functions are all
given unit weights (Schaeffer 1985; Szapudi & Szalay 1997).
Note the alternative notation $C_{NM} = N^{N-1}M^{M-1}Q_{NM}$
(Bernardeau 1996a).


When the cell separation is much larger than the cell
radius it is natural to expand the generating function in
terms of $\xi/\bar{\xi}$ (BeS; Szapudi & Szalay 1993; SC; Szapudi &
Szalay 1997). As a consequence $\exp[R(x,y)] \simeq 1 + R(x,y)$,
thus

$$ P(x,y) \simeq P(x)P(y)\,[1 + R(x,y)] + O(\xi^2/\bar{\xi}^2). \qquad (27) $$

The above was found to be extremely accurate in practice,
even for touching cells.

Phenomenological theories of the bivariate counts attempt
to relate the cumulant correlators, $Q_{NM}$, to the
cumulants, $Q_N$. The leading assumptions, used for the numerical
explorations of Sect. 3, are reviewed next.

The SS approximation is purely phenomenological. It
assumes that

$$ Q_{NM}^{\rm SS} = Q_{N+M}. \qquad (28) $$

For example

$$ Q_{12}^{\rm SS} = Q_3, \qquad Q_{13}^{\rm SS} = Q_{22}^{\rm SS} = Q_4. \qquad (29) $$

The BeS model postulates a factorization property for
the joint counts in cells, $P_{NM}$. From equation (27),

$$ P_{NM} \simeq P_N P_M\,(1 + b_{NM}). \qquad (30) $$

In addition the BeS model imposes† that $b_{NM} = b_N b_M$,
implying

$$ Q_{NM}^{\rm BeS} = Q_{N1}^{\rm BeS}\, Q_{M1}^{\rm BeS}. \qquad (31) $$

† This is also suggested by recent numerical results obtained by
Munshi, Coles & Melott (1999b) in 2D dynamics.



This is true in a minimal tree construction providing specific
relationships between $Q_N$'s and $Q_{N1}$'s (see Bernardeau &
Schaeffer 1992 and Bernardeau & Schaeffer 1999 for a more
detailed discussion of this model). For instance,

$$ Q_{12}^{\rm BeS} = Q_3, \qquad Q_{13}^{\rm BeS} = \frac{4}{3}Q_4 - \frac{1}{3}Q_3^2, \qquad Q_{22}^{\rm BeS} = Q_3^2. \qquad (32) $$

Interestingly, the SS and BeS models are identical when
$Q_N = 1$ for all $N$. Since in practice the $Q_N$'s depart from
unity only weakly, the difference between the two models
is usually insignificant, despite the formal dissimilarity
between them.

When calculations are done in the PT framework the
properties (30) and (31) are also naturally obtained (Bernardeau
1996a),

$$ Q_{NM}^{\rm PT} = Q_{N1}^{\rm PT}\, Q_{M1}^{\rm PT}. \qquad (33) $$

The evaluation of the lowest non-trivial orders yields (Fry
1984; Bernardeau 1996),

$$ Q_{12}^{\rm PT} = \frac{34}{21} - \frac{n+3}{6}, $$

$$ Q_{13}^{\rm PT} = \frac{11710}{3969} - \frac{61(n+3)}{63} + \frac{2(n+3)^2}{27}, $$

$$ Q_{22}^{\rm PT} = \frac{1156}{441} - \frac{34(n+3)}{63} + \frac{(n+3)^2}{36}, \qquad (34) $$

where the $\gamma_2 = d^2\bar{\xi}/(d\log\ell)^2$ term in the second equation above
is neglected as previously.

Note that, in the weakly nonlinear regime where the
$Q_N$'s are given by equations (22) and (23), SS and BeS
models give factors $Q_{NM}$ of the same order as the correct
result (34). In fact, the BeS model agrees exactly with PT for
$n = -3$.
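The lowest order PT cumulant correlators can be evaluated exactly in the same way as the cumulants. In the sketch below the function names are hypothetical; the coefficients are the PT values $Q_{12} = 34/21 - (n+3)/6$, $Q_{13} = 11710/3969 - 61(n+3)/63 + 2(n+3)^2/27$ and $Q_{22} = 1156/441 - 34(n+3)/63 + (n+3)^2/36$, with the $\gamma_2$ term neglected. The factorization property $Q_{22} = Q_{12}^2$ and the value $Q_{12} = 34/21$ at $n = -3$ serve as consistency checks.

```python
from fractions import Fraction

def q12_pt(n):
    """Lowest order PT cumulant correlator Q_12 for spectral index n."""
    n3 = Fraction(n) + 3
    return Fraction(34, 21) - n3 / 6

def q13_pt(n):
    """PT cumulant correlator Q_13, gamma_2 term neglected."""
    n3 = Fraction(n) + 3
    return Fraction(11710, 3969) - Fraction(61, 63) * n3 + Fraction(2, 27) * n3**2

def q22_pt(n):
    """PT cumulant correlator Q_22; equals Q_12 squared by factorization."""
    n3 = Fraction(n) + 3
    return Fraction(1156, 441) - Fraction(34, 63) * n3 + n3**2 / 36

# Example spectral indices; the values are arbitrary illustrations.
correlators = {n: (q12_pt(n), q13_pt(n), q22_pt(n)) for n in (-3, -2, -1)}
```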

PT as a model can be extended throughout the non-linear
regime as well. In the resulting theory, EPT (Colombi
et al. 1997), the form of the $Q_N$'s is still taken from PT; e.g.,
equation (22) can be extended into the non-linear regime.
Then $n$, formerly the slope of the power spectrum, becomes
a formal fitting parameter, denoted by $n_{\rm eff}$. It was found
empirically in simulations and galaxy data that all higher
order $Q_N$ can be described fairly accurately with a single $n_{\rm eff}$
parameter (Colombi et al. 1997; Szapudi, Meiksin & Nichol
1996; Szapudi et al. 1999d). This idea can be generalized
to the bivariate distribution in several ways, as proposed
by Szapudi & Szalay (1997). The version used in this work,
denoted by E$^2$PT, consists of taking the same $n_{\rm eff}$ for the
$Q_{NM}$'s in equations (34) as for the $Q_N$'s.

The new assumptions for the generating function are
sufficiently general to incorporate most conceivable models,
notably perturbation theory and its variants. Fortunately
the changes do not incur many complications for the error
calculations compared to those of SC. The overlapping part
of the integral in equation (19) depends on the unchanged
monovariate distribution. This calculation, the most complicated
and CPU consuming component of the technique,
was performed by SC. Here only the appropriate values of
$\bar{\xi}$ and the $Q_N$'s had to be substituted into the analytic results.
The missing cross-correlations of the overlapping part
were computed in an exactly analogous fashion as previously.
This somewhat cumbersome task was carried out by
the Mathematica computer algebra package.

Table 1. The standard CDM model used by CSS (CDM1)
and the four CDM variants proposed by the Virgo Consortium
(CDM2,3,4,5). The notations are the same as in Jenkins
et al. (1998).

Model   Ω     Λ     h     Γ      σ8
CDM1    1.0   0.0   0.5   0.50   1.00
CDM2    0.3   0.0   0.7   0.21   0.85
CDM3    0.3   0.7   0.7   0.21   0.90
CDM4    1.0   0.0   0.5   0.21   0.51
CDM5    1.0   0.0   0.5   0.50   0.51

The bivariate generating function induces the non-overlapping
part of the integral constituting the error generating
function, i.e. the finite volume effects. The computation
consists of expanding the bivariate generating function, a simple
albeit tedious analytical computation performed again with
Mathematica.

Up to the locally Poissonian assumption and the expansion
of the bivariate generating function to linear order in
$\xi$ (an excellent approximation even for touching cells), the
results are completely general, and can be used easily if new
interesting models surface.

Explicit analytic expressions for the cosmic errors and 
cross-correlations are given in Appendix A for factorial mo- 
ments, up to third order. 



3 APPLICATION: SDSS-LIKE SURVEYS 

The results were applied to calculate the expected errors,
cross-correlations, and biases for SDSS-like galaxy catalogs
as defined in detail in CSS. The SDSS is a magnitude limited
galaxy survey where the average number density of galaxies
decreases with distance from the observer. To investigate
a reasonable range of underlying clustering properties, the
shape and normalization of the two-point correlation function,
thus $\bar{\xi}$ and $\bar{\xi}(L)$ (see introduction and Appendix A),
were taken from the standard CDM model of CSS (hereafter
CDM1) as well as four CDM variants proposed by the Virgo
Consortium (Jenkins et al. 1998) (hereafter CDM2,3,4,5, as
described in Table 1). CDM1 is used as default, except when
otherwise indicated. The SS and BeS models depend on
the higher order cumulants $Q_N$, thus EPT could be used
with $n = -2.5$. This agrees approximately with the measurements
in the APM and EDSGC (Gaztañaga 1994;
Szapudi et al. 1995; Szapudi, Meiksin & Nichol 1996; Szapudi &
Gaztañaga 1998). The same spectral index was used as
default for E$^2$PT, as well as the indices $n = -1$ and $n = -9$
for reasonable alternatives of higher order clustering, especially
in the highly non-linear regime. The most successful
model of all for error calculations (Colombi et al. 1999b),
E$^2$PT, was used as a default unless otherwise noted.

For the sake of conciseness, the technical information 
on figures is contained in the captions only and the physical 
results are explained in the main text with the least possible 
overlap. The more conventional procedure of duplicating in- 
formation in the main text would have rendered the paper 
unnecessarily long and cumbersome due to the exceptionally
large number of figures and the multitude of line-types,
panels, etc. contained in them.



3.1 Cosmic Errors and Bias 

Figure 1 shows the expected errors on the factorial moments 
in SDSS-like surveys for various models and contributions. 
The estimator for the factorial moments proposed by CSS 
is assumed, 



Ft 



c Z-~> 



{Nj) k uj e ,k(rj) 
[Mn)] k ' 



(35) 



where $C$ is the (very large) number of sampling cells thrown
at positions $r_i$, $\phi(r_i)$ is the selection function, and the
weight $\omega_k$ is determined to minimize the variance of the
estimator. As shown by CSS, the weights can be optimized
by numerically solving an integro-differential equation, while
the approximate solution is $\omega \propto 1/\sigma^2$, with $\sigma$
representing the full errors of the given statistic. The above
optimal weight is assumed for most curves (see the figure
caption for details). In general, i) the different models SS, BeS,
and E$^2$PT yield almost the same results, ii) the dependence on
the two-point function causes a spread reaching a factor
of 5 on certain scales, almost independently of order, iii)
different reasonable assumptions for the underlying $Q_N$'s
generate a significant spread which, depending on both order
and scale, can reach an order of magnitude. The assumptions
for the $Q_N$'s, however, allowed a quite generous
variation, taking into account the typical difference between
the weakly non-linear and highly non-linear regimes in CDM-type
simulations. A uniform weighting scheme boosts the errors
on small scales considerably compared to the optimal
weights introduced by CSS, except for $F_1$, where there is
no significant difference. In most of the relevant dynamic
range, $1\,h^{-1}$ Mpc $\lesssim \ell \lesssim 50\,h^{-1}$ Mpc, edge effects
dominate the errors. For any realistic survey, the geometry is
expected to be more complex because of the holes cut out
around bright stars, cosmic rays, etc. This could significantly
boost the edge effects compared to the calculations
presented here. Discreteness effects are important only for very
small scales, $\ell \lesssim 1\,h^{-1}$ Mpc, and for uniform weights.
Optimal weights render discreteness and finite volume effects on
a par in this regime.
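
As an illustration of how an estimator of the form (35) could be evaluated, the following is a minimal Python sketch. The function names, the normalisation of the weights to unit mean, and the use of the approximate inverse-variance weight are assumptions made here for illustration; this is not the FORTRAN program mentioned in the paper.

```python
def falling_factorial(n, k):
    """Return (n)_k = n (n-1) ... (n-k+1), the k-th falling factorial of n."""
    result = 1
    for j in range(k):
        result *= n - j
    return result


def factorial_moment_estimate(counts, selection, weights, k):
    """Weighted estimate of the k-th factorial moment from C sampling cells.

    counts[i]    : number of objects N_i found in cell i
    selection[i] : selection function phi(r_i) at the cell position
    weights[i]   : weight omega_k(r_i); assumed here to average to one
    """
    if not (len(counts) == len(selection) == len(weights)):
        raise ValueError("counts, selection and weights must have equal length")
    total = 0.0
    for n_i, phi_i, w_i in zip(counts, selection, weights):
        total += falling_factorial(n_i, k) * w_i / phi_i ** k
    return total / len(counts)


def inverse_variance_weights(sigma):
    """Approximate optimal weights, omega proportional to 1/sigma^2.

    The rescaling to unit mean is an illustrative normalisation choice.
    """
    raw = [1.0 / s ** 2 for s in sigma]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]
```

With a constant selection function and uniform weights the sketch reduces to the plain average of $(N_i)_k$ over the cells.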

Figure 2 is analogous to Figure 1 for the connected moments.
In contrast with the factorial moments, i) the finite volume
error is completely negligible compared to the other
contributions, and for orders $N > 2$ it is strongly dependent
on the models, SS $\gg$ BeS $\gg$ E$^2$PT, ii) the dependence on
the two-point correlations is less pronounced, iii) the dependence
on higher order clustering appears to be less sensitive
to order.

Figure 3 recapitulates the results of Figs. 1-2 by comparing
the errors on measurements of factorial moments with those on
connected moments. On small scales, $\ell \lesssim 7$-$10\,h^{-1}$ Mpc,
the cumulants fare much better than the factorial moments; one
reason is the suppression of finite volume effects. Note especially
the large difference between $Q_3$ and $F_3$. Interestingly,
$Q_3$ has small errors, within a factor of two of $\Delta\bar\xi/\bar\xi$, and there
is a range in which $\Delta Q_3/Q_3 \lesssim \Delta F_2/F_2$. The edge effects for
the cumulants are greatly boosted on large scales compared
to the factorial moments. However, this has to be interpreted



© 0000 RAS, MNRAS 000, 000-000 



Cosmic Statistics of Statistics 7 



[Figure 1 graphic; horizontal axes: $\ell$ ($h^{-1}$ Mpc)]



Figure 1. Prediction of the cosmic error on factorial moments, $\sigma_{F_k} = \Delta F_k/F_k$, for $k \le 4$. The first column shows the cosmic error
from disjoint cells for different models SS (long dashes), BeS (dot-dashes), E$^2$PT (long dashes with dots), and also from overlapping
cells (dashes). The indistinguishable solid curves display the total error for each model. All the above assumes the optimal radial sampling
weight $\omega$ (CSS), while the dotted line was computed with uniform weight for comparison. The second and third columns demonstrate the
robustness of the results with respect to variation of the two-point correlation function in the different CDM models (solid,
dots, dashes, long dashes and dot-dashes for CDM1,2,3,4,5, respectively), and the choice of the spectral index for E$^2$PT (solid for $n = -2.5$, dots for
$n = -9$ and dashes for $n = -1$), respectively. In the first column $n = -2.5$ and CDM1 were used. The second column has $n = -2.5$ with
E$^2$PT, the third column has CDM1 with E$^2$PT. Note that for the first and third columns the errors for $F_1$ are independent of higher
order statistics, therefore the different models superpose.



cautiously since those scales are close to the limit of appli- 
cability of the theory according to equation (Q). 

Figure 4 compares the magnitude of the cosmic bias to
the cosmic error for cumulants. As expected from theoretical
prejudice, the cosmic bias is orders of magnitude smaller
than the cosmic error in the regime where the perturbative
approach is applicable, i.e. $\ell \lesssim 10\,h^{-1}$ Mpc for the SDSS. On
larger scales the bias calculation apparently becomes unstable.
Thus Figure 4 re-confirms the correctness of equation
([?]) as a guide for the validity of the theory.



3.2 Cosmic Cross-Correlations 

Figure 5 displays the cross-correlation coefficients

$$\delta_{kl} = \frac{\langle \delta F_k \, \delta F_l \rangle}{\Delta F_k \, \Delta F_l} \qquad (36)$$






8 I. Szapudi, S. Colombi and F. Bernardeau





[Figure 2 graphic; horizontal axes: $\ell$ ($h^{-1}$ Mpc)]



Figure 2. Same as Fig. 1 for connected moments, i.e. for $\bar\xi$ and $S_N = Q_N N^{N-2}$. The curves are only plotted when the expansion in
equation (?) yields positive results for the cosmic error.



Figure 3. Comparison of the cosmic errors for the factorial
and connected moments. CDM1 was assumed for the two-point
correlation function and E$^2$PT with $n = -2.5$ for higher order
statistics. Solid, dotted, dash, and long dash lines correspond to
orders 1 through 4, respectively. Of each pair of curves with the
same line-types the one turning up on large scales relates to the
cumulant. The right stopping point of the long dash curve for
$S_4 = 16Q_4$ was determined similarly to Figure 2.



for factorial moments under various circumstances. In this
equation the denominator always contains the full cosmic
error even when only certain contributions are examined for
the cross-correlations; this ensures additivity. For most
calculations homogeneous weights were used. The correlations
increase from small scales, $\ell \lesssim 1\,h^{-1}$ Mpc, to an approximate
plateau. The finite volume contribution exhibits a unimodal
behavior with a peak on small scales, while edge effects rise
on large scales. The shape of the finite volume part is mainly
due to the division by the full cosmic error in the previous
equation: on small scales discreteness, on large scales
edge effects cause suppression. The same argument applies
to the drop of the full coefficient on small scales: discreteness
(therefore dilution) boosts the cosmic errors, and thus reduces
$\delta_{kl}$. Note also that the relative contribution of the finite
volume effect decreases with order, as already found for the
cosmic error.
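
The additivity obtained by always dividing by the full cosmic error can be illustrated with a short Python sketch. The dictionary layout, the contribution names and the numerical values in the usage note are placeholders chosen for illustration only; they are not results from the paper.

```python
import math


def correlation_contributions(cov_parts, k, l):
    """Split the cross-correlation coefficient of orders (k, l) into pieces.

    cov_parts maps a contribution name (e.g. "finite", "edge", "discrete")
    to a dict of covariances keyed by (k, l), with k <= l. Every piece is
    divided by the same denominator, built from the FULL cosmic errors, so
    the pieces add up to the total coefficient.
    """
    def total(i, j):
        return sum(part[(i, j)] for part in cov_parts.values())

    norm = math.sqrt(total(k, k) * total(l, l))
    pieces = {name: part[(k, l)] / norm for name, part in cov_parts.items()}
    pieces["total"] = total(k, l) / norm
    return pieces
```

For example, with the made-up inputs `{"finite": {(1,1): 1.0, (2,2): 4.0, (1,2): 1.5}, "edge": {(1,1): 2.0, (2,2): 3.0, (1,2): 1.0}, "discrete": {(1,1): 1.0, (2,2): 2.0, (1,2): 0.5}}` the three pieces sum to the total coefficient, which would not hold if each piece were normalised by its own partial error.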

In addition to the previous comments, the following
observations can be made from Fig. 5: i) similarly to the cosmic
errors, the different models SS, BeS, and E$^2$PT yield almost
exactly the same cross-correlations, ii) optimal weighting
naturally increases the cross-correlation, especially on small
scales and when the weights are selected to be optimal for
the higher order of the two statistics, iii) the effects of the
choice of the two-point correlation function are considerable,
Figure 5. Cosmic cross-correlation coefficients of the factorial moments. The individual columns correspond to cross terms $\delta_{kl}$ for pairs
of indices $(k, l) = (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)$, respectively. Homogeneous weights were used for all panels, except for the second
row. Except for row five, $n = -2.5$ is assumed for higher order statistics. Except for the first row, E$^2$PT is used. Finally, except for the
fourth row, CDM1 is the underlying cosmology.

The first row of panels compares various contributions within the framework of SS (long dashes), BeS (dot-dashes), and E$^2$PT (long dashes
with dots). The difference between the three models is negligible. The resulting three groups of curves in increasing order at $\ell = 8\,h^{-1}$ Mpc
correspond to the finite volume, overlapping (i.e. discreteness+edge effects), and the total contributions, respectively. Note that the full
cosmic error was used in the denominator for each curve to preserve additivity. This explains the residual dependence of the overlapping
contributions on the model.

The second row is analogous to the first one but examines the dependence on the optimal weights. The uniform weights (solid) are
compared to the optimal weights for orders $k$ (dots) and $l$ (dashes), where $k < l$.

The third row illustrates dilution effects. The full sampling is shown by solid lines while the effects of 10 times dilution are displayed by
dots. The curves in increasing order at $\ell = 8\,h^{-1}$ Mpc again correspond to the finite volume, overlapping (i.e. discreteness+edge effects),
and the total contributions, respectively.

The fourth row displays how total contributions are affected by the choice of the two-point correlation functions in different variants
CDM1 through CDM5 (in the same order: solid, dots, dashes, long dashes, dot-dashes).

The fifth row shows the changes in the cross-correlations due to varying the higher order statistics via changing the spectral index in
the framework of E$^2$PT: $n = -1$ (dashes), $n = -2.5$ (solid), and $n = -9$ (dots).

Figure 5: Continued. 
while the results are robust against variations of higher order
statistics.

Figure 6 displays the correlation coefficients $\delta_{kl}$ for the
cumulants. The figure is exactly analogous to Figure 5. Similar
conclusions can be drawn as previously; we only point
out the differences: i) the perturbative nature of our method
limits the domain of applicability of the results, ii) finite
volume contributions are appreciably weaker than for factorial
moments, as already established for the cosmic errors, iii)
the dependence on the underlying clustering is complicated
to interpret because of the different ratio natures of the various
statistics involved; this is explained in more detail below.



Figure 7 illustrates the principal results for cross-correlations
in the SDSS. The factorial moments are always
positively correlated. The correlations depend on the difference
of orders $|k - l|$: the larger the difference, the smaller
the correlation coefficient, in agreement with intuition. It is
worth noticing that the correlations exhibit approximately
the same magnitude and scale dependence for the same value
of $|k - l|$, i.e. they increase from small scales, $\ell \lesssim 1\,h^{-1}$ Mpc, to a
plateau at larger scales. On small scales the correlations are
diluted by discreteness.

The behavior of the cross-correlations for the cumulants
is more difficult to interpret. There are three classes of
cumulants: $\bar N$ (order 1), $\bar\xi$ (order 2), and $Q_N$ (order $N$)†, each
with slightly different normalization for historical and practical
reasons: $\bar\xi$ is normalized by $\bar N^2$, and the $Q_N$'s likewise by
$\bar\xi^{N-1}$. Thus one has to interpret separately the correlations
between $\bar N$ and $\bar\xi$, $\bar N$ and the $Q_N$'s, $\bar\xi$ and the $Q_N$'s, and finally
between the $Q_N$'s themselves. The latter are the simplest to
understand: they have similar positive correlations to the
factorial moments, as expected. The rest of the correlations
are fairly weak, in agreement with intuition when the difference
of orders $|k - l|$ is large. The correlations for $\bar N$ and
$\bar\xi$, and for $\bar\xi$ and $Q_3$ are smaller than for factorial moments
of the same order. This is due to the ratio nature of $\bar\xi$ and
$Q_3$, which suppresses the correlations somewhat. As mentioned
earlier, the perturbative nature of our method limits
the validity of our results above $10\,h^{-1}$ Mpc for SDSS-like
surveys. Also, there are some small negative correlations
which should not be over-interpreted. At the present level
of accuracy only the weakness of correlations can be established.

† Thus the first two classes have only one member each.

Figure 6. Same as Figure 5, with the orders $N = 1, 2, 3, 4$ corresponding to the average count $F_1 = \bar N$, $\bar\xi$, $S_3 = 3Q_3$, and $S_4 = 16Q_4$,
respectively. There are some differences, however, which are listed next. The range of the y axis is changed to $[-1, 1]$. The sequence of
the various contributions in the three upper rows is different from that of Figure 5, except for the first column. (In the third row of this
column the cross-correlations are approximately zero for the diluted case, and the order is slightly different but unimportant.) The rest
of the columns in the three upper rows have approximately zero finite volume contributions to the correlation coefficient. Thus the finite
volume effect is easily identifiable as a straight line, while the other curves all superpose and they correspond to the overlapping and
total contributions.

The right end point of the curves is chosen according to equation (?), replacing "$\ll$" with "$<$". This condition is not exact; the sharp
downturn on many panels suggests that a realistic limit is around $10\,h^{-1}$ Mpc.

Figure 6: Continued.



4 SUMMARY AND DISCUSSION 

This article formulated the theory of errors on quantities
related to counts in cells, focusing especially on cumulants
and factorial moments. A universal, analytic method
based on a Taylor expansion approach was devised to calculate
explicitly the cosmic error, the cosmic bias, and the
cosmic cross-correlations for virtually any statistic derived
from counts in cells. There are always three contributions to
these quantities (SC): finite volume, edge, and discreteness
effects. The principal results are the following:

(i) Cosmic errors: SC have computed the cosmic errors 
on factorial moments for two particular cases of the hier- 
archical model. CSS have extended the results for inhomo- 
geneous catalogs and for optimal weighting. These previous 
Figure 4. The comparison of the cosmic bias and the cosmic
error for the cumulants. For all curves CDM1 and E$^2$PT with
$n = -2.5$ were used. Line-types correspond to $\bar\xi$ (solid), $S_3 = 3Q_3$
(dotted), and $S_4 = 16Q_4$ (dashed), respectively. The three lower
curves show the absolute value of the cosmic bias, while the three
upper ones correspond to the cosmic error. The end point of the
curve for $\Delta S_4/S_4$ was determined as previously (Fig. 2). For the
cosmic bias $b_{Q_N}$ there is some irregularity above $\sim 10\,h^{-1}$ Mpc.
At this point the validity of our theory is probably exceeded,
and the results become unstable. In the regime where the theory
is applicable the cosmic bias is always negative for the SDSS
catalog.



calculations have been generalized for cumulants, and for
PT; explicit analytic results for the factorial moments are
given in Appendix A. The cosmic error depends on the bivariate
distributions, for which EPT had to be generalized.
The new Ansatz is termed E$^2$PT, and is explained in detail in
Colombi et al. (1999b). For the SDSS it is predicted that the
cumulants fare better than the factorial moments on scales
$\ell \lesssim 10\,h^{-1}$ Mpc. On large scales the situation is reversed due
to the enhanced sensitivity of the connected moments to
edge effects. For the particular example of the SDSS, however,
this regime is outside the validity of our perturbative
method. In the scale range $1\,h^{-1}$ Mpc - $10\,h^{-1}$ Mpc the
expected errors are smaller than 3% for $\bar\xi$, 4% for $Q_3$,
and 15% for $Q_4$. For reference, the errors determined by
CSS for factorial moments of order $k = 2$, 3, and 4 were
1-2%, 3-5%, and 5-10%, respectively, in the regime
$1\,h^{-1}$ Mpc $\lesssim \ell \lesssim 50\,h^{-1}$ Mpc. A detailed investigation in a
range of reasonable models shows that the estimates are robust
within a factor of $\sim 2$.

Note that according to equation (8) $\bar\xi$ is a linear functional
of $\xi$, the two-point correlation function. In fact, if $\xi$
is a power-law of index $\gamma$ the two are proportional to each
other, $\bar\xi \propto \xi$. For a linear functional, the error propagation is
expected to be especially simple: the errors on $\bar\xi$ should be a
linear function of the errors on $\xi$. Of course, this statement is
only approximate, because its validity depends on the nature
of the estimators used to measure $\xi$ and $\bar\xi$. For a power-law
correlation function, we conjecture that the approximation
Figure 7. Summary of the cross-correlation results. The factorial
moments (upper panel) and cumulants (lower panel) are displayed
assuming CDM1 and E$^2$PT with $n = -2.5$. The orders $(k, l)$
are distinguished by different line-types: (1,2) solid, (1,3) dots,
(1,4) dashes, (2,3) long dashes, (2,4) dot-dashes, (3,4) dots-long
dashes. The curves for the lower panel are displayed when
equation (?) is valid (replacing "$\ll$" with "$<$").



be at least partly corrected for estimators of $\xi$ (e.g., Ripley
1988; Landy & Szalay 1993; Szapudi & Szalay 1998) but not
for standard estimators of $\bar\xi$ (e.g., CSS). In that regime, it
is therefore expected that $\sigma_\xi \lesssim \sigma_{\bar\xi}$. Nevertheless,
approximation (?) should be more accurate for estimating the
errors on the two-point correlation function than the methods
prevailing in the literature, especially the meaningless
bootstrap method.

(ii) Cosmic bias: an estimator is biased if its ensemble
average is different from the true value. This is typical when
non-linear functions of unbiased estimators are constructed,
such as $\bar\xi$ and the $Q_N$'s. For such statistics a perturbative
expansion can be used to determine the bias $b$. A simple but
important consequence is that $b = O(\sigma^2)$, where $\sigma$ is the
relative cosmic error. As a result the cosmic bias is negligible
compared to the cosmic error in the perturbative regime. A


necessary, and in practice sufficient (Colombi et al. 1999b)



holds for the relative cosmic error. There might be some difference
at large scales, where edge effects dominate and can


criterion for the validity of the series expansion is that b 
a <C 1. For the SDSS the cosmic bias is predicted to be 
negligible on scales $\lesssim 50\,h^{-1}$ Mpc for $\bar\xi$, and $\lesssim 10\,h^{-1}$ Mpc
for higher order statistics. Explicit formulae are given for $b_{\bar\xi}$
and $b_{Q_3}$ in Appendix B.

(iii) Cosmic cross-correlations: they generalize the con- 
cept of the cosmic error by considering the full correlation 
matrix of the statistics. Correlations between indicators in- 
fluence the constraining power of measurements on theories. 
The calculation for the cross-correlations of the factorial mo- 
ments is exactly analogous to that of the cosmic error pre- 
sented by SC. Explicit analytic results are given in Appendix 
A. Together with the results of SC this completes the the- 
ory of the full cosmic cross-correlation matrix and forms the 
basis of subsequent calculations concerning the errors of any 
quantity related to counts in cells, such as the cumulants. 
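
To make the step from the factorial-moment covariance matrix to a derived statistic concrete, the following Python sketch propagates errors to $\bar\xi = F_2/F_1^2 - 1$ with a Taylor expansion: first order for the cosmic error, second order for the cosmic bias. It is a minimal sketch under stated assumptions, not the authors' program: the function names are hypothetical, the covariances `d11`, `d12`, `d22` are taken as given inputs, and only the two lowest factorial moments are involved.

```python
import math


def xi_bar(f1, f2):
    """Average correlation function from the first two factorial moments."""
    return f2 / f1 ** 2 - 1.0


def propagate_xi_bar(f1, f2, d11, d12, d22):
    """Relative cosmic error and relative cosmic bias of xi_bar.

    d11, d12, d22 are the covariances Cov(F_1,F_1), Cov(F_1,F_2), Cov(F_2,F_2).
    """
    # Gradient of xi_bar with respect to (F_1, F_2).
    g1 = -2.0 * f2 / f1 ** 3
    g2 = 1.0 / f1 ** 2
    # Non-zero second derivatives; d^2 xi_bar / dF_2^2 vanishes.
    h11 = 6.0 * f2 / f1 ** 4
    h12 = -2.0 / f1 ** 3
    variance = g1 * g1 * d11 + 2.0 * g1 * g2 * d12 + g2 * g2 * d22
    bias = 0.5 * (h11 * d11 + 2.0 * h12 * d12)
    xi = xi_bar(f1, f2)
    return math.sqrt(variance) / xi, bias / xi
```

The bias is quadratic in the fluctuations while the error is linear, which is the origin of the scaling of the bias with the square of the relative error quoted in the text.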

While the following results were established in a concrete
example, i.e. a suite of SDSS-like surveys, we conjecture that
they are quite generic. In agreement with intuition, factorial
moments of close orders appear to exhibit stronger correlations
than those of far orders. The results are more complex
for cumulants, although the $Q_N$'s behave similarly to factorial
moments. Interestingly, the correlations between $\bar\xi$ and
$\bar N$, and between $\bar\xi$ and the $Q_N$'s are weaker than for factorial
moments of the same order. Optimal weighting naturally
augments correlations, and discreteness effects likewise reduce
them. These results depend significantly on the clustering
properties of the underlying distribution of galaxies,
although the qualitative features are robust.

The theoretical calculations of this paper were confronted
with measurements in a state-of-the-art large $\tau$CDM
simulation (Colombi et al. 1999b); the results are previewed
next.

The detailed investigations suggest that the theory of 
errors presented in this article is fairly accurate, especially 
in the weakly non-linear regime, where a few percent preci- 
sion was achieved for the factorial moments. In the highly 
non-linear regime it appears that the approximate nature of 
the models for bivariate distribution translates into a slight 
overestimation of the errors, perhaps by a factor of two in 
the worst case. The situation will be improved in the future, 
if more realistic models are constructed for the bivariate 
counts. 

The predicted cross-correlations for the factorial moments
describe the qualitative features of the measurements
quite well; however, the details are less precise than for the
errors. When the difference of orders is $|k - l| = 1$, the theory
is about 20% accurate, while it gradually loses precision,
up to about 50% in the worst case, as the difference of orders
increases. This behavior suggests that the underlying
locally Poisson assumption becomes less precise. An attempt
to improve on this would introduce encumbering complications
because of the necessity of the trivariate generating
function, and is left for future research.

The present results complement the investigations of
SC, and their generalization by CSS for inhomogeneous catalogs.
Together they constitute the statistically complete description
of the errors whenever the Gaussian approximation
for the cosmic distribution of events is sufficiently accurate.
This is true when the cosmic errors are small (Szapudi et
al. 1999c), an essential result for likelihood analyses. Applications
of the theory of cross-correlations are discussed
elsewhere (Szapudi, Colombi & Bernardeau 1999a; Bouchet,
Colombi & Szapudi 1999).

While the investigations presented in this article are
sufficiently accurate for any foreseeable practical application
and include all crucial effects and contributions, there are
some minor points which were not mentioned thus far:

(i) Galaxy bias (not to be confused with the cosmic bias):
light might not trace mass, thus the statistical properties of
galaxies might be different from those of the dark matter.
Theories and models relying only on dark matter dynamics
such as PT and EPT might miss some important aspects of
the galaxy distribution. However, current measurements in
two and three dimensional galaxy catalogs suggest that the
models used here such as SS, BeS, and even EPT, yield a fairly
realistic description (e.g., Gaztanaga 1994; Szapudi, Meiksin
& Nichol 1996). To be complete, however, one should in
principle include the effects of bias in the theory.

(ii) Redshift distortions: they arise from the peculiar velocities
of galaxies in three dimensional catalogs. Their effect
on the statistics is well known. The two-point correlation
function and the amplitude of the $Q_N$'s decrease in
the highly non-linear regime, while in the weakly non-linear
regime only the normalization of the two-point correlation
function is affected significantly (e.g., Matsubara & Suto
1994; Hivon et al. 1995; Szapudi et al. 1999b). The extent to
which redshift distortions alter clustering is thus well within
the range of variations considered previously.

(iii) Cosmological parameters: the dependence of the $Q_N$
coefficients on cosmological parameters is extremely weak.
This has been explicitly shown in PT (Bouchet et al. 1992;
Bernardeau 1994a; Hivon et al. 1995), and it is expected to
carry over to the nonlinear regime as well (Nusser & Colberg
1998; Scoccimarro & Frieman 1998; Szapudi et al. 1999b).

(iv) Angular catalogs and weak lensing: this article considered
three dimensional distributions only. Analogous calculations
can be done for angular catalogs, and for weak
lensing, which promises to be an important means of investigation
of the cosmological parameters in the near future
(Bernardeau et al. 1997; Jain, Seljak & White 199?;
Gaztanaga & Bernardeau 1998). This point is investigated
elsewhere (Bernardeau, Colombi & Szapudi 1999).

(v) Edge effects: so far the calculations were performed to
leading order in $v/V$, and the results are independent of the
geometry of the catalog. This is a sufficiently precise approximation
for compact surveys such as the SDSS. However,
for more complicated survey geometries, such as the 2dF or
the VIRMOS survey, the computations can be improved by
taking into account higher order terms. The next to leading
order term depends on the perimeter (surface) of the survey.
This is studied in Colombi et al. (1999).

(vi) Full description: for maximum likelihood analyses 
with multi-scale measurements there is one more step needed 
to complete the statistical description. The cross-correlation 
matrix should be calculated between statistics estimated on 
different scales. This calculation is a trivial although some- 
what tedious generalization of the previous considerations. 
It is left for future work. 

(vii) Cosmic bias: these results are in contrast with those
of Hui & Gaztanaga (1998, HG). The reasons for the difference
are that a) they neglected discreteness effects, which
could be significant on small scales for the cumulants $Q_N$, b)
although their calculation in principle includes edge effects,
dominant on large scales, they finally neglected them (however,
see the discussion in Appendix B), c) they did not
realize that $b = O(\sigma^2)$ in the perturbative regime. Outside
of the domain of validity, this condition naturally breaks
down, as the measurement of HG suggests. However, to estimate
the cosmic bias and the cosmic error they use only 10 realizations
of the local universe. In the Virgo Hubble Volume
simulation with 4096 realizations, Colombi et al. (1999b)
find that the cosmic bias is always dominated by the cosmic
errors. Moreover, according to Szapudi et al. (1999c),
the cosmic distribution function, the probability distribution
function of measurements, shows significant skewness.
This is a source of effective bias for only one realization, i.e.
our local universe; see Colombi et al. (1999b) and Szapudi
et al. (1999c) for a detailed discussion. HG have proposed
an Ansatz for scales beyond the validity of the Taylor expansion
in the theory. This recipe, however, neglects edge effects,
which constitute the dominant contribution on large
scales, except for $\bar\xi$ (see Appendix B); the apparent agreement
of their Ansatz with measurements appears to be a
coincidence. Nevertheless, their calculations, if sufficiently
tested and gauged with N-body experiments, may still be
used to estimate the cosmic bias. A detailed comparison of
their analytic results for $\bar\xi$ with ours is contained in
Appendix B.
APPENDIX A: THE COSMIC ERROR AND CROSS-CORRELATIONS FOR FACTORIAL MOMENTS

This section complements the analytic results for the cosmic errors obtained in SC with explicit formulae for the cross- 
correlations. These together with the previous results establish the full cosmic cross-correlation matrix, which underlies all 
error calculations for statistics related to counts in cells. 

For the sake of conciseness and simplicity the following notation is introduced for the cosmic cross-correlation matrix

$$\Delta_{kl} = \mathrm{Cov}(F_k, F_l) = \langle \delta F_k \, \delta F_l \rangle. \qquad (A1)$$

Note that $\Delta_{kk} = (\Delta F_k)^2$ is the cosmic error. $\Delta_{kl}$ has three contributions

$$\Delta_{kl} = \Delta^F_{kl} + \Delta^E_{kl} + \Delta^D_{kl}, \qquad (A2)$$

where $\Delta^F_{kl}$, $\Delta^E_{kl}$ and $\Delta^D_{kl}$ are the finite volume, edge and discreteness effect contributions, respectively. SC computed $\Delta^F_{kk}$,
$k \le 4$, and presented the analytic results for $k \le 3$, within the framework of the SS and the BeS models. Assuming local
Poissonian behavior, and a power-law $r^{-\gamma}$ for $\xi(r)$ on scales $r \le 2\ell$, they also calculated the discreteness and edge effect
contributions, $\Delta^D_{kk}$ and $\Delta^E_{kk}$ for $k \le 4$ with explicit formulation for $k \le 3$. All computations were performed to leading
order in $v/V$, where $v$ and $V$ are the cell and the sample volume, respectively. The aim of this Appendix is to present the
extension of their results for PT (and EPT) for the finite volume contribution (Appendix A.1), and for cross-correlations
$k < l \le 3$ (Appendix A.2). Note that, as in SC, all the calculations were performed up to fourth order, but the results are only
printed to third order. A FORTRAN program can be obtained from the authors for computing numerically the cosmic errors,
cross-correlations and biases for factorial moments and cumulants.

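A1 The Finite Volume Error for Factorial Moments in the PT and E$^2$PT frameworks

The bivariate generating function for counts in cells employed in SC had to be generalized to incorporate PT. This generalization
can be used for most other models, including SS and BeS. The explicit results from this formalism are presented
next:

$$\Delta^F_{11} = \bar N^2 \, \bar\xi(\hat L), \qquad (A3)$$

$$\Delta^F_{22} = 4 \bar N^4 \, \bar\xi(\hat L) \left(1 + 2\bar\xi Q_{12} + \bar\xi^2 Q_{22}\right), \qquad (A4)$$

$$\Delta^F_{33} = 9 \bar N^6 \, \bar\xi(\hat L) \left(1 + 2\bar\xi + \bar\xi^2 + 4\bar\xi Q_{12} + 4\bar\xi^2 Q_{12} + 6\bar\xi^2 Q_{13} + 6\bar\xi^3 Q_{13} + 4\bar\xi^2 Q_{22} + 12\bar\xi^3 Q_{23} + 9\bar\xi^4 Q_{33}\right). \qquad (A5)$$

The quantity $\bar\xi(\hat L)$ is roughly the average of the two-point correlation function over the survey volume:

$$\bar\xi(\hat L) = \frac{1}{\hat V^2} \int_{|r_1 - r_2| \ge 2\ell} d^3 r_1 \, d^3 r_2 \, \xi(|r_1 - r_2|). \qquad (A6)$$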

To leading order in v/V this integral reads (Colombi et al. 1999a) 



(A7) 



d 3 rid 3 r 2 £(|ri-r 2 |), (A8) 

47rr 2 dr£(r). (A9) 

For most practical cases, the term proportional to $\bar\xi_1(2\ell)$ can be neglected and the integral can be performed on the sample
volume $V$ instead of the volume covered by the cells included in the catalog, $\hat V$: $\bar\xi(\hat L) \simeq \bar\xi(L)$. If kept, the correction
$8v\,\bar\xi_1(2\ell)/V$, which can be viewed as an "edge-finite volume effect", usually yields a small correction compared to the edge
effect errors (see Colombi et al. 1999a,b for practical examples).

In the PT framework, the cumulants factorize, $Q_{kl} = Q_{k1} Q_{l1}$. Each $Q_{k1}$ depends on logarithmic derivatives $\gamma_j = -n_j - 3$
of the (linear) variance, $\bar\xi$, with respect to scale (Bernardeau 1996a). Note that in the E$^2$PT framework, the nonlinear variance
$\bar\xi$ is taken. The parameter $\gamma_1$ is adjusted such that $S_3 = 3Q_3 = 34/7 + \gamma_1$ fits the measured, nonlinear skewness. Higher order
statistics and bivariate statistics are then derived from PT expressions with this value of $\gamma_1$ (and $\gamma_j = 0$, $j \ge 2$). A detailed
numerical investigation of E$^2$PT for the cosmic errors can be found elsewhere (Colombi et al. 1999b).

The above results can represent the SS model as well by replacing $Q_{kl}$ with $Q_{k+l}$. In the BeS framework, similarly as in
PT, the relation $Q_{kl} = Q_{k1} Q_{l1}$ holds. In that case the $Q_{k1}$ can be computed explicitly from the vertex generating function
as combinations of $Q_l$, $l \le k + 1$ (see BeS and SC for details). Corresponding analytic expressions of the finite volume error
can be found in SC.

Note finally that for the BeS and PT models, because of the factorization properties (31) and (33), we have

$$\frac{\Delta^F_{kl}}{\bar N^{k+l} \, \bar\xi(\hat L)} = \frac{\Delta^F_{k1}}{\bar N^{k+1} \, \bar\xi(\hat L)} \; \frac{\Delta^F_{l1}}{\bar N^{l+1} \, \bar\xi(\hat L)}. \qquad (A10)$$
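
As a numerical cross-check of the factorization, the finite volume expressions (A3)-(A5), (A11), (A14) and (A17) can be coded directly. The Python sketch below is illustrative only: it takes those expressions as read here, imposes the factorized forms $Q_{22} = Q_{12}^2$, $Q_{23} = Q_{12} Q_{13}$, $Q_{33} = Q_{13}^2$, and uses a hypothetical function name and made-up input values. With $\Delta^F_{11} = \bar N^2 \bar\xi(\hat L)$, relation (A10) implies $(\Delta^F_{kl})^2 = \Delta^F_{kk} \Delta^F_{ll}$, which is what the check below tests.

```python
def finite_volume_cov(nbar, xi, xi_l, q12, q13):
    """Finite volume covariances of factorial moments up to third order.

    nbar : average count, xi : average correlation function in a cell,
    xi_l : average of the two-point function over the survey,
    q12, q13 : bivariate coefficients; higher ones follow from factorization.
    """
    q22, q23, q33 = q12 * q12, q12 * q13, q13 * q13
    d = {}
    d[(1, 1)] = nbar ** 2 * xi_l
    d[(2, 2)] = 4 * nbar ** 4 * xi_l * (1 + 2 * xi * q12 + xi ** 2 * q22)
    d[(3, 3)] = 9 * nbar ** 6 * xi_l * (
        1 + 2 * xi + xi ** 2 + 4 * xi * q12 + 4 * xi ** 2 * q12
        + 6 * xi ** 2 * q13 + 6 * xi ** 3 * q13 + 4 * xi ** 2 * q22
        + 12 * xi ** 3 * q23 + 9 * xi ** 4 * q33
    )
    d[(1, 2)] = 2 * nbar ** 3 * xi_l * (1 + xi * q12)
    d[(1, 3)] = 3 * nbar ** 4 * xi_l * (1 + xi + 2 * xi * q12 + 3 * xi ** 2 * q13)
    d[(2, 3)] = 6 * nbar ** 5 * xi_l * (
        1 + xi + 3 * xi * q12 + 3 * xi ** 2 * q13 + xi ** 2 * q12
        + 2 * xi ** 2 * q22 + 3 * xi ** 3 * q23
    )
    return d


def factorization_holds(d, rel_tol=1e-10):
    """Check (Delta_kl)^2 == Delta_kk * Delta_ll for all off-diagonal pairs."""
    for (k, l) in [(1, 2), (1, 3), (2, 3)]:
        lhs = d[(k, l)] ** 2
        rhs = d[(k, k)] * d[(l, l)]
        if abs(lhs - rhs) > rel_tol * abs(rhs):
            return False
    return True
```

In this factorized case the finite volume part of the correlation coefficient between any two orders is unity before division by the full cosmic error.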



A2 The Cosmic Cross-Correlations for Factorial Moments 

The explicit formulae of the cosmic cross-correlations presented next complete the cosmic cross-correlation matrix. They 
provide the full statistical description to second order and can be used both for maximum likelihood analysis, and for 
calculating the cross-correlation matrix of any estimator related to factorial moments with the method presented in the main
text.

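$$\Delta^F_{12} = 2 \bar N^3 \, \bar\xi(\hat L) \left(1 + \bar\xi Q_{12}\right), \qquad (A11)$$

$$\Delta^E_{12} = \bar N^3 \, \bar\xi \, \frac{v}{V} \left(8.525 + 11.42\,\bar\xi Q_3\right), \qquad (A12)$$

$$\Delta^D_{12} = \bar N^2 \, \frac{v}{V} \left(2.0 + 1.478\,\bar\xi\right), \qquad (A13)$$

$$\Delta^F_{13} = 3 \bar N^4 \, \bar\xi(\hat L) \left(1 + \bar\xi + 2\bar\xi Q_{12} + 3\bar\xi^2 Q_{13}\right), \qquad (A14)$$

$$\Delta^E_{13} = \bar N^4 \, \bar\xi \, \frac{v}{V} \left(9.05 + 11.42\,\bar\xi + 21.67\,\bar\xi Q_3 + 42.24\,\bar\xi^2 Q_4\right), \qquad (A15)$$

$$\Delta^D_{13} = \bar N^3 \, \frac{v}{V} \left(3.0 + 6.653\,\bar\xi + 4.949\,\bar\xi^2 Q_3\right), \qquad (A16)$$

$$\Delta^F_{23} = 6 \bar N^5 \, \bar\xi(\hat L) \left(1 + \bar\xi + 3\bar\xi Q_{12} + 3\bar\xi^2 Q_{13} + \bar\xi^2 Q_{12} + 2\bar\xi^2 Q_{22} + 3\bar\xi^3 Q_{23}\right), \qquad (A17)$$

$$\Delta^E_{23} = \bar N^5 \, \bar\xi \, \frac{v}{V} \left(23.08 + 33.09\,\bar\xi + 90.17\,\bar\xi Q_3 + 55.19\,\bar\xi^2 Q_3 + 211.2\,\bar\xi^2 Q_4 + 229.9\,\bar\xi^3 Q_5\right), \qquad (A18)$$

$$\Delta^D_{23} = \bar N^3 \, \frac{v}{V} \left(1.943 + 6\bar N + 4.522\,\bar\xi + 26.61\,\bar N \bar\xi + 9.898\,\bar N \bar\xi^2 + 3.531\,\bar\xi^2 Q_3 + 39.59\,\bar N \bar\xi^2 Q_3 + 39.53\,\bar N \bar\xi^3 Q_4\right). \qquad (A19)$$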

In the above equations the edge and discreteness effect contributions were calculated from a locally Poisson Ansatz. On scales smaller than twice the cell size the two-point correlation function is assumed to be a power law, $\xi(r) \propto r^{-\gamma}$ with $\gamma = 1.8$. The detailed investigation of SC showed that variations of $\gamma$ do not significantly affect the coefficients in the above equations. These equations are therefore valid even when $\xi$ departs weakly from a strict power law.
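The factorization (A10) can be checked numerically against the finite volume expressions (A11), (A14) and (A17). The following is a minimal sketch, not the companion FORTRAN program: the function names and the input values are illustrative assumptions, the formulas are the readings of (A11), (A14) and (A17) written out in the comments, and the symmetry $Q_{21} = Q_{12}$ is assumed so that the factorized coefficients become $Q_{22} = Q_{12}^2$ and $Q_{23} = Q_{12} Q_{13}$.

```python
# Finite volume cross-correlations of factorial moments, as read from
# (A11), (A14) and (A17). All names and numbers here are illustrative.

def delta_f_12(N, xi_L, xi, Q12):
    # (A11): 2 N^3 xi(L) (1 + xi Q12)
    return 2.0 * N**3 * xi_L * (1.0 + xi * Q12)

def delta_f_13(N, xi_L, xi, Q12, Q13):
    # (A14): 3 N^4 xi(L) (1 + xi + 2 xi Q12 + 3 xi^2 Q13)
    return 3.0 * N**4 * xi_L * (1.0 + xi + 2.0 * xi * Q12 + 3.0 * xi**2 * Q13)

def delta_f_23(N, xi_L, xi, Q12, Q13, Q22, Q23):
    # (A17): 6 N^5 xi(L) (1 + xi + 3 xi Q12 + 3 xi^2 Q13
    #                     + xi^2 Q12 + 2 xi^2 Q22 + 3 xi^3 Q23)
    return 6.0 * N**5 * xi_L * (
        1.0 + xi + 3.0 * xi * Q12 + 3.0 * xi**2 * Q13
        + xi**2 * Q12 + 2.0 * xi**2 * Q22 + 3.0 * xi**3 * Q23
    )

def normalised(delta, N, order, xi_L):
    # Delta^F_kl / (N^(k+l) xi(L)), the combination entering (A10)
    return delta / (N**order * xi_L)

# Illustrative inputs (not taken from the paper).
N, xi_L, xi = 2.5, 1.0e-3, 0.8
Q12, Q13 = 1.6, 2.9
# Assumed factorization Q_kl = Q_k1 Q_l1 with Q_21 = Q_12.
Q22, Q23 = Q12 * Q12, Q12 * Q13

lhs = normalised(delta_f_23(N, xi_L, xi, Q12, Q13, Q22, Q23), N, 5, xi_L)
rhs = (normalised(delta_f_12(N, xi_L, xi, Q12), N, 3, xi_L)
       * normalised(delta_f_13(N, xi_L, xi, Q12, Q13), N, 4, xi_L))
print(lhs, rhs)  # the two sides of (A10) for (k, l) = (2, 3)
```

With a non-factorized choice such as `Q22 = 3.0` the two sides no longer agree, which is a quick way to see that (A10) relies on the BeS/PT factorization rather than on the general form of (A17).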



APPENDIX B: THE COSMIC BIAS: COMPARISON WITH HG 
B1 The cosmic bias on $\bar{\xi}$: detailed analysis

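Within the theoretical framework of this article, the cosmic bias on $\bar{\xi}$ can be expressed in terms of $\Delta_{kl}$ (defined in Appendix A):
$$b_{\bar{\xi}} = \frac{1+\bar{\xi}}{\bar{\xi}}\left(3\delta_{11} - 2\delta_{12}\right), \qquad \mathrm{(B1)}$$
with

$$\delta_{kl} \equiv \frac{\Delta_{kl}}{F_k F_l}. \qquad \mathrm{(B2)}$$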
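The combination $3\delta_{11} - 2\delta_{12}$ can be traced to a second-order Taylor expansion of the estimator of $\bar{\xi}$ around the true factorial moments. The sketch below makes three assumptions that are not spelled out in this appendix: the estimator is $\tilde{\xi} = \tilde{F}_2/\tilde{F}_1^2 - 1$, the $\tilde{F}_k$ are unbiased with covariance $\Delta_{kl}$, and the cosmic bias is the relative bias $\langle\tilde{\xi}\rangle/\bar{\xi} - 1$.

```latex
\begin{align}
\langle f(\tilde{F}) \rangle - f(F)
  &\simeq \frac{1}{2}\sum_{k,l}
     \frac{\partial^2 f}{\partial F_k\,\partial F_l}\,\Delta_{kl},
  \qquad f = \frac{F_2}{F_1^2} - 1, \\
\frac{\partial^2 f}{\partial F_1^2} &= \frac{6F_2}{F_1^4}, \qquad
\frac{\partial^2 f}{\partial F_1\,\partial F_2} = -\frac{2}{F_1^3}, \qquad
\frac{\partial^2 f}{\partial F_2^2} = 0, \\
\langle \tilde{\xi} \rangle - \bar{\xi}
  &\simeq \frac{3F_2}{F_1^4}\,\Delta_{11} - \frac{2}{F_1^3}\,\Delta_{12}
   = \frac{F_2}{F_1^2}\left(3\delta_{11} - 2\delta_{12}\right)
   = (1+\bar{\xi})\left(3\delta_{11} - 2\delta_{12}\right).
\end{align}
```

The last step uses $F_2/F_1^2 = 1 + \bar{\xi}$. Dividing by $\bar{\xi}$ gives a relative bias with the prefactor $(1+\bar{\xi})/\bar{\xi}$.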

Using the analytic results in Appendix A and assuming E$^2$PT, the cosmic bias can be written to leading order in $v/V$ as
$$b_{\bar{\xi}} = b^{D}_{\bar{\xi}} + b^{E}_{\bar{\xi}} + b^{F}_{\bar{\xi}}, \qquad \mathrm{(B3)}$$
where the discreteness, edge, and finite volume effects are, respectively,



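$$b^{D}_{\bar{\xi}} \simeq -\frac{1}{N\bar{\xi}}\,\frac{v}{V}, \qquad \mathrm{(B4)}$$

$$b^{E}_{\bar{\xi}} \simeq \left(-0.5 + 16.5\,\bar{\xi} - 18.5\,\bar{\xi} Q_{3}\right)\frac{v}{V}, \qquad \mathrm{(B5)}$$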
$$b^{F}_{\bar{\xi}} \simeq \left(-\frac{1}{\bar{\xi}} + 3 - 2Q_{12}\right)\bar{\xi}(L). \qquad \mathrm{(B6)}$$



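The result of HG is the following:

$$b_{\bar{\xi}} = \left(-\frac{1}{\bar{\xi}} + 3 - 2Q_{12}\right)\bar{\xi}_2, \qquad \mathrm{(B7)}$$
with

$$\bar{\xi}_2 = \frac{1}{V^2}\int_V d^3r_1\, d^3r_2\; \bar{\xi}(|\mathbf{r}_1 - \mathbf{r}_2|), \qquad \mathrm{(B8)}$$

$$\bar{\xi}(r) = \frac{1}{v^2}\int_{v_1} d^3x_1 \int_{v_2} d^3x_2\; \xi(|\mathbf{x}_1 - \mathbf{x}_2|). \qquad \mathrm{(B9)}$$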



The integral in the above equation is performed over two cells with volumes $v_1$, $v_2$ separated by distance $r$. The calculation of HG thus drops discreteness effects, on the grounds that they can be neglected since $v/(NV)$ = 1/Nl is small. In contrast, SC have shown that terms proportional to 1/Nl dominate the cosmic error on small scales. This may be true in principle for the cosmic bias as well. Equation (B4), however, shows that discreteness effects are indeed negligible unless $\bar{\xi} \sim$ 1/Nl. Note that the same argument is invalid for higher order cumulants such as $Q_3$ and $Q_4$: there discreteness effects can induce a significant contribution to the bias, particularly on small scales (see the example below).

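The calculation of HG includes edge effects through the integral (B8) over the volume $V$ covered by cells included in the catalog. Following SC one can split integral (B8) into two contributions according to whether the cells overlap or not:

$$\bar{\xi}_2 = \frac{1}{V^2}\int_{|\mathbf{r}_1 - \mathbf{r}_2| \ge 2\ell} d^3r_1\, d^3r_2\; \bar{\xi}(|\mathbf{r}_1 - \mathbf{r}_2|) + \frac{1}{V^2}\int_{|\mathbf{r}_1 - \mathbf{r}_2| < 2\ell} d^3r_1\, d^3r_2\; \bar{\xi}(|\mathbf{r}_1 - \mathbf{r}_2|). \qquad \mathrm{(B10)}$$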



While it would be superfluous here to enter into the details of this somewhat tedious calculation, it is clear, as in SC, that the overlapping term will typically yield a contribution $b^{E}_{\bar{\xi}}$ proportional to $\bar{\xi}\,v/V$. On the other hand, disjoint cells contribute a term approximately of order $\bar{\xi}(L) - 8(v/V)\,\bar{\xi}_1(2\ell)$. [This reasoning is valid to leading order in $v/V$. Higher order corrections proportional to the perimeter of the survey must be taken into account for more accuracy (Colombi et al. 1999a).] Since the correction proportional to $\bar{\xi}_1$ might exactly compensate for the term $b^{E}_{\bar{\xi}}$ introduced by overlapping cells, HG argue that $\bar{\xi}_2 \simeq \bar{\xi}(L)$, suggesting exact cancellation. Our calculations based on the local Poisson approximation indeed show that $b^{F}_{\bar{\xi}}/\bar{\xi}(L)$ is of the same order as $b^{E}_{\bar{\xi}}/[(8v/V)\,\bar{\xi}_1(2\ell)]$ for the particular case of $\bar{\xi}$. This result does not hold, however, for cumulants of higher order, where edge effects are dominant on large scales. At this level of accuracy our calculation becomes approximate as well, mainly because of the local Poisson assumption (Colombi et al. 1999b); it is therefore impossible to evaluate the residual edge effects for $\bar{\xi}$ in this framework.



B2 The cosmic bias on higher order statistics 

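A simple algebraic calculation of the cosmic bias on $Q_3 = S_3/3$ yields

$$b_{Q_3} = b_{S_3\bar{\xi}^2} - 2\,b_{\bar{\xi}} - 2\,\delta_{23} + 3\,\delta_{22}, \qquad \mathrm{(B11)}$$
with

$$b_{S_3\bar{\xi}^2} = \frac{1 + 3\bar{\xi} + S_3\bar{\xi}^2}{S_3\bar{\xi}^2}\left(6\delta_{11} - 3\delta_{13}\right) - 3\,\frac{1+\bar{\xi}}{S_3\bar{\xi}^2}\left(3\delta_{11} - 2\delta_{12}\right). \qquad \mathrm{(B12)}$$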
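The two bracketed combinations appearing in this appendix, $3\delta_{11} - 2\delta_{12}$ and $6\delta_{11} - 3\delta_{13}$, are what a second-order expansion gives for the ratios $F_2/F_1^2$ and $F_3/F_1^3$. The sketch below checks this numerically with a finite-difference Hessian. It is an illustration under stated assumptions, not the paper's program: `second_order_bias`, the step size and all input values are hypothetical, and the relative covariances in `rel` are arbitrary numbers rather than results from Appendix A.

```python
def second_order_bias(f, F, Delta, h=1e-4):
    """Approximate <f(F_est)> - f(F) by (1/2) sum_kl d2f/dF_k dF_l * Delta_kl,
    with the second derivatives taken by central finite differences."""
    n = len(F)
    steps = [h * abs(x) for x in F]
    bias = 0.0
    for k in range(n):
        for l in range(n):
            def f_shift(sk, sl):
                G = list(F)
                G[k] += sk * steps[k]
                G[l] += sl * steps[l]
                return f(G)
            # Mixed central difference; for k == l it reduces to the usual
            # second difference with step 2 * steps[k].
            d2 = (f_shift(1, 1) - f_shift(1, -1)
                  - f_shift(-1, 1) + f_shift(-1, -1)) / (4.0 * steps[k] * steps[l])
            bias += 0.5 * d2 * Delta[k][l]
    return bias

# Illustrative inputs: factorial moments F_1, F_2, F_3 built from N, xi, S_3.
N, xi, S3 = 2.0, 0.5, 3.0
F = [N, N**2 * (1.0 + xi), N**3 * (1.0 + 3.0 * xi + S3 * xi**2)]

# Arbitrary symmetric relative covariances delta_kl = Delta_kl / (F_k F_l).
rel = [[0.010, 0.012, 0.015],
       [0.012, 0.020, 0.025],
       [0.015, 0.025, 0.040]]
Delta = [[rel[k][l] * F[k] * F[l] for l in range(3)] for k in range(3)]

# Estimators built as ratios of factorial moments.
bias2_num = second_order_bias(lambda G: G[1] / G[0]**2 - 1.0, F, Delta)
bias3_num = second_order_bias(lambda G: G[2] / G[0]**3, F, Delta)

# Closed forms with the bracketed combinations discussed in the text.
bias2_closed = (F[1] / F[0]**2) * (3.0 * rel[0][0] - 2.0 * rel[0][1])
bias3_closed = (F[2] / F[0]**3) * (6.0 * rel[0][0] - 3.0 * rel[0][2])

print(bias2_num, bias2_closed)
print(bias3_num, bias3_closed)
```

Neither ratio has a second derivative involving only $F_2$ and $F_3$, so the entries $\delta_{22}$, $\delta_{23}$ and $\delta_{33}$ of `rel` do not affect these two results.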



Writing out the discreteness contribution in equation (B11) explicitly, although trivial, would go beyond the scope of this paper. To illustrate that it is not negligible, numerical results are given next. For $\ell = 1\,h^{-1}$ Mpc, $b_{\bar{\xi}} \simeq -5 \times 10^{-5}$ and $b_{Q_3} = -2 \times 10^{-4}$ in the standard SDSS-like catalog of CSS. After a dilution by a factor 100 (which means that the catalog would still contain $\sim 8000$ objects, e.g. CSS), these terms become $b_{\bar{\xi}} = -4 \times 10^{-5}$, a small change as expected, and $b_{Q_3} = -0.2$, a change by three orders of magnitude. This means that discreteness effects can contribute significantly to the bias on small scales, in contrast with the claims of HG. The accuracy of this statement is limited by the local Poisson assumption, which, however, becomes increasingly precise as the sample becomes more and more diluted.